Absolute Neighbour Difference based Correlation Test for Detecting Heteroscedastic Relationships
It is a challenge to detect complicated data relationships thoroughly. Here, we propose a new statistical measure, the absolute neighbour difference based neighbour correlation coefficient. It detects associations between variables by examining the heteroscedasticity of the unpredictable variation of the dependent variable. Unlike previous studies, the new method concentrates on measuring nonfunctional relationships rather than functional or mixed associations. Used either alone or in combination with other measures, it enables a convenient test of heteroscedasticity. It also allows functional and nonfunctional relationships to be measured separately, which gives deeper insight into data associations. The method is concise and easy to implement. It does not rely on explicitly estimating the regression residuals or the dependencies between variables, so it is not restricted to any particular model assumption. The mechanisms of the correlation test are proved theoretically and demonstrated with numerical analyses.
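The abstract does not give the coefficient's exact formula. Below is a minimal sketch of one plausible reading, stated as an assumption. It sorts the data by x and takes absolute differences between neighbouring y values as a residual-free proxy for local noise scale. It then correlates those differences with their position on the x axis. A permutation p-value turns this into a test of heteroscedasticity. The function names, the use of Pearson correlation, and the permutation scheme are hypothetical choices, not the authors' definition.

```python
import numpy as np

def neighbour_pairs(x, y):
    """Sort by x; return midpoints of neighbouring x values and |y_{i+1} - y_i|."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    order = np.argsort(x)
    xs, ys = x[order], y[order]
    mid = 0.5 * (xs[1:] + xs[:-1])   # where each neighbour pair sits on the x axis
    d = np.abs(np.diff(ys))          # local proxy for the unpredictable variation of y
    return mid, d

def neighbour_diff_corr(x, y, n_perm=200, seed=0):
    """Pearson correlation between pair location and |neighbour difference|,
    with a two-sided permutation p-value (hypothetical construction)."""
    mid, d = neighbour_pairs(x, y)
    r = np.corrcoef(mid, d)[0, 1]
    rng = np.random.default_rng(seed)
    # Count permutations whose |r| is at least as large as the observed |r|.
    exceed = sum(
        abs(np.corrcoef(mid, rng.permutation(d))[0, 1]) >= abs(r)
        for _ in range(n_perm)
    )
    return float(r), (exceed + 1) / (n_perm + 1)

# Synthetic example: noise scale grows with x (heteroscedastic) vs constant noise.
rng = np.random.default_rng(42)
x = rng.uniform(0.0, 5.0, 2000)
y_het = np.sin(x) + x * rng.normal(size=x.size)
y_hom = np.sin(x) + rng.normal(size=x.size)
r_het, p_het = neighbour_diff_corr(x, y_het)
r_hom, p_hom = neighbour_diff_corr(x, y_hom)
```

The smooth part sin(x) barely changes between neighbouring points, so it largely cancels in the differences. This is why no regression model has to be fitted first. In this synthetic setup, r_het comes out clearly positive, while r_hom stays near zero.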
There are several key limitations of the MADE algorithm. 1. As mentioned in Section 3.1, the MADE algorithm can only mask neural networks such that they respect the autoregressive property. For the non-deterministic MADE masking algorithm presented in Germain et al. [2015], Proposition 1 formalizes this point. In Section 3.1, we showed that finding the weight masks for each neural network layer is equivalent … Figure 7 provides a visual example of the steps performed by Algorithm 1. … 's last row, we need the products of the last row of … Randomly generated adjacency structures of 15 dimensions: IP gives better objective values when the adjacency matrix is very sparse.
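For reference, here is a sketch of the standard degree-based MADE masking from Germain et al. [2015]. It is not the Algorithm 1 referred to above. Each input gets a degree from 1 to D. Each hidden unit gets a random degree between the previous layer's minimum and D − 1, which is the non-deterministic part. A connection is kept only if degrees do not decrease, and outputs use a strict inequality. The function and variable names are illustrative. The final check confirms the autoregressive property: output i never depends on inputs j ≥ i.

```python
import numpy as np

def made_masks(n_in, hidden_sizes, seed=0):
    """Degree-based MADE masks; each mask has shape (fan_out, fan_in). Requires n_in >= 2."""
    rng = np.random.default_rng(seed)
    degrees = [np.arange(1, n_in + 1)]                 # input ordering 1..D
    for h in hidden_sizes:
        low = degrees[-1].min()
        degrees.append(rng.integers(low, n_in, size=h))  # degrees in [low, D-1]
    masks = []
    for d_prev, d_next in zip(degrees[:-1], degrees[1:]):
        # hidden unit k may see unit j of the previous layer if m(k) >= m(j)
        masks.append((d_next[:, None] >= d_prev[None, :]).astype(int))
    # output d may only see hidden units with degree strictly below d
    masks.append((degrees[0][:, None] > degrees[-1][None, :]).astype(int))
    return masks

def connectivity(masks):
    """Count of paths from each input (column) to each output (row)."""
    conn = masks[-1]
    for m in reversed(masks[:-1]):
        conn = conn @ m
    return conn

masks = made_masks(5, [16, 16])
conn = connectivity(masks)   # D x D; should be strictly lower triangular
```

Masking with fixed degree orderings like this can only encode the full autoregressive ordering. That is the limitation the paragraph above points to when moving to general adjacency structures.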
Generative AI as a Linguistic Equalizer in Global Science
Filimonovic, Dragan, Rutzer, Christian, Macher, Jeffrey, Weder, Rolf
These authors contributed equally to this work.
For decades, the dominance of English has created a substantial barrier in global science, disadvantaging non-native speakers. The recent rise of generative AI (GenAI) offers a potential technological response to this long-standing inequity. We provide the first large-scale evidence testing whether GenAI acts as a linguistic equalizer in global science. Drawing on 5.65 million scientific articles published from 2021 to 2024, we compare GenAI-assisted and non-assisted publications from authors in non-English-speaking countries. Using text embeddings derived from a pretrained large language model (SciBERT), we measure each publication's linguistic similarity to a benchmark of scientific writing from U.S.-based authors and track stylistic convergence over time. We find significant and growing convergence for GenAI-assisted publications after the release of ChatGPT in late 2022. The effect is strongest for domestic coauthor teams from countries linguistically distant from English. These findings provide large-scale evidence that GenAI is beginning to reshape global science communication by reducing language barriers in research.
The rapid rise of generative AI (GenAI) has sparked an important debate about its role in science. It raises the question of whether GenAI homogenizes writing and erodes authorship norms (1,2) or acts as a "linguistic equalizer" that lowers barriers for non-native English speakers (3,4). This debate is especially salient because English has long dominated global science. This dominance gives native speakers a structural advantage (5-7). It also imposes heavier writing burdens and particular risks of peer-review bias on researchers from non-Anglophone countries (8-12). As a result, many of these researchers have historically spent time in the U.S. or the UK to learn to write in English, or have hired (expensive) language experts (13, 14).
Against this backdrop, the release of ChatGPT in late 2022, a chatbot based on a large language model (LLM), marked a turning point. This widely accessible, low-cost, and human-like tool offers a potential means of reducing longstanding linguistic imbalances (15, 16).
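The abstract describes measuring each publication's similarity to a U.S. benchmark in SciBERT embedding space. It does not say how similarity is computed. Below is a minimal sketch assuming cosine similarity to the benchmark centroid, with one precomputed embedding per document. The vectors here are synthetic. Their size, 768, matches the hidden size of BERT-base models such as SciBERT. The function name is hypothetical.

```python
import numpy as np

def similarity_to_benchmark(embeddings, benchmark):
    """Cosine similarity of each row of `embeddings` to the mean benchmark vector."""
    centroid = benchmark.mean(axis=0)
    centroid = centroid / np.linalg.norm(centroid)
    unit = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    return unit @ centroid

# Synthetic stand-ins for document embeddings (not real SciBERT output).
rng = np.random.default_rng(0)
style = rng.normal(size=768)                       # shared "benchmark style" direction
benchmark = style + 0.5 * rng.normal(size=(100, 768))
close_docs = style + 1.0 * rng.normal(size=(50, 768))
distant_docs = rng.normal(size=(50, 768))
sim_close = similarity_to_benchmark(close_docs, benchmark)
sim_distant = similarity_to_benchmark(distant_docs, benchmark)
```

To track convergence over time under this assumption, one would average these scores by publication period and group (for example, GenAI-assisted vs non-assisted) and compare the trends.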
Automating Dataset Updates Towards Reliable and Timely Evaluation of Large Language Models
There are two updating strategies: 1) a mimicking strategy, which generates similar samples based on the original data while preserving their stylistic and contextual essence; and 2) an extending strategy, which further expands existing samples across varying cognitive levels by adapting Bloom's taxonomy of educational objectives.
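The two strategies can be pictured as prompt templates sent to an LLM. Below is a minimal sketch in which the template wording and function names are hypothetical and not taken from the paper. The six levels are those of the revised Bloom's taxonomy. The sketch only builds prompt strings and does not call any model.

```python
BLOOM_LEVELS = ["Remember", "Understand", "Apply", "Analyze", "Evaluate", "Create"]

def mimic_prompt(sample: str) -> str:
    """Mimicking: ask for a new sample in the same style and context (hypothetical wording)."""
    return (
        "Write a new question in the same style and context as the one below, "
        "with a different correct answer.\n\n" + sample
    )

def extend_prompts(sample: str) -> dict[str, str]:
    """Extending: one prompt per Bloom's taxonomy level (hypothetical wording)."""
    return {
        level: (
            f"Rewrite the question below as a new question targeting the "
            f"'{level}' level of Bloom's taxonomy.\n\n{sample}"
        )
        for level in BLOOM_LEVELS
    }

prompts = extend_prompts("What is the capital of France?")
```

Keeping one prompt per level makes it easy to generate updated samples spanning the cognitive range from recall to creation.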
Biased AI improves human decision-making but reduces trust
Lai, Shiyang, Kim, Junsol, Kunievsky, Nadav, Potter, Yujin, Evans, James
Current AI systems minimize risk by enforcing ideological neutrality, yet this may introduce automation bias by suppressing cognitive engagement in human decision-making. We conducted randomized trials with 2,500 participants to test whether culturally biased AI enhances human decision-making. Participants interacted with politically diverse GPT-4o variants on information evaluation tasks. Partisan AI assistants enhanced human performance, increased engagement, and reduced evaluative bias compared to non-biased counterparts, with amplified benefits when participants encountered opposing views. These gains carried a trust penalty: participants underappreciated biased AI and overcredited neutral systems. Exposing participants to two AIs whose biases flanked human perspectives closed the perception-performance gap. These findings complicate conventional wisdom about AI neutrality, suggesting that strategic integration of diverse cultural biases may foster improved and resilient human decision-making.
Sharper Concentration Inequalities for Multi-Graph Dependent Variables
In multi-task learning (MTL) where each task involves graph-dependent data, existing theoretical analyses yield a sub-optimal generalization risk bound of $O(\frac{1}{\sqrt{n}})$, where $n$ is the number of training samples. This is attributed to the lack of a foundational sharper concentration inequality for multi-graph dependent random variables. To fill this gap, this paper proposes a new Bennett inequality for such variables, enabling the derivation of a sharper risk bound of $O(\frac{\log n}{n})$. Specifically, building on the proposed Bennett inequality, we propose a corresponding Talagrand inequality for the empirical process. We then develop an analytical framework based on local Rademacher complexity to strengthen generalization analyses in MTL with multi-graph dependent data. Finally, we apply these theoretical advances to applications such as Macro-AUC optimization, demonstrating the superiority of our theoretical results over previous work, which is also corroborated by experimental results.
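Bennett-type inequalities are sharper because they use the variance, not just the range, of the summands. The sketch below compares the classical Hoeffding and Bennett tail bounds for independent variables. The multi-graph dependent version proposed in the paper is not reproduced here. The parameter values are illustrative.

```python
import math

def hoeffding_bound(t, n, width):
    """P(S - E S >= t) <= exp(-2 t^2 / (n * width^2)) for n independent X_i,
    each supported on an interval of length `width`."""
    return math.exp(-2.0 * t ** 2 / (n * width ** 2))

def bennett_bound(t, var_sum, b):
    """P(S - E S >= t) <= exp(-(var_sum / b^2) * h(b t / var_sum)),
    h(u) = (1 + u) log(1 + u) - u, for independent X_i with X_i - E X_i <= b."""
    u = b * t / var_sum
    h = (1.0 + u) * math.log1p(u) - u
    return math.exp(-(var_sum / b ** 2) * h)

# n zero-mean variables in [-1, 1] with small variance 0.01 each.
n, t = 1000, 50.0
hoeff = hoeffding_bound(t, n, width=2.0)
benn = bennett_bound(t, var_sum=n * 0.01, b=1.0)
```

Here Hoeffding gives exp(-1.25), which is weak. Bennett is many orders of magnitude smaller because the total variance (10) is far below what the range alone allows. This gap is what makes faster rates such as $O(\frac{\log n}{n})$ reachable when combined with local Rademacher complexity.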
Real-time Monitoring of Economic Shocks using Company Websites
Koenig, Michael, Rauch, Jakob, Woerter, Martin
Understanding the effects of economic shocks on firms is critical for analyzing economic growth and resilience. We introduce a Web-Based Affectedness Indicator (WAI), a general-purpose tool for real-time monitoring of economic disruptions across diverse contexts. WAI applies Large Language Model (LLM) assisted classification and information extraction to texts from over five million company websites. From these, it quantifies the degree and nature of firms' responses to external shocks. Using the COVID-19 pandemic as a specific application, we show that WAI is highly correlated with pandemic containment measures and reliably predicts firm performance. Unlike traditional data sources, WAI provides timely firm-level information across industries and geographies worldwide that would otherwise be unavailable due to institutional and data availability constraints. This methodology offers significant potential for monitoring and mitigating the impact of technological, political, financial, health or environmental crises, and represents a transformative tool for adaptive policy-making and economic resilience.
Economic shocks, whether driven by public health crises, technological disruptions, geopolitical conflicts, or climate events, pose significant challenges to businesses and policymakers alike. Timely and accurate monitoring of these shocks is critical for crafting effective responses and enhancing economic resilience. However, traditional methods for measuring the impacts of such disruptions, such as surveys and administrative data, are often limited by cost, time lags, and coverage. In this study, we introduce the Web-Based Affectedness Indicator (WAI), a scalable and cost-effective tool for real-time monitoring of economic disruptions at the firm level. By analyzing textual data from millions of company websites, WAI provides granular insights into how firms experience and respond to external shocks.
This methodology overcomes traditional limitations by leveraging ubiquitous online content and state-of-the-art natural language processing (NLP) models to generate a dynamic and comprehensive view of economic affectedness. WAI can provide information on a wide range of challenges, including supply chain disruptions, financial crises, and climate-related shocks.
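The core step in an indicator like WAI is turning firm-level text classifications into an aggregate measure. Below is a minimal sketch under loud assumptions. The `classify_affected` function is a hypothetical keyword stand-in for the paper's LLM classifier. The index is simplified to the share of firms flagged as affected per region and period. It ignores the degree and nature of responses that the paper also extracts.

```python
from collections import defaultdict
from dataclasses import dataclass

@dataclass
class FirmSnapshot:
    firm_id: str
    region: str
    period: str
    text: str   # website text captured in this period

def classify_affected(text: str) -> bool:
    """Hypothetical stand-in for an LLM classifier: simple keyword match."""
    keywords = ("covid", "pandemic", "lockdown", "temporarily closed")
    lowered = text.lower()
    return any(k in lowered for k in keywords)

def affectedness_index(snapshots):
    """Share of firms classified as affected, per (region, period)."""
    counts = defaultdict(lambda: [0, 0])   # key -> [affected, total]
    for s in snapshots:
        c = counts[(s.region, s.period)]
        c[0] += classify_affected(s.text)
        c[1] += 1
    return {key: affected / total for key, (affected, total) in counts.items()}

snapshots = [
    FirmSnapshot("a", "CH", "2020-03", "Due to the COVID-19 pandemic we are temporarily closed."),
    FirmSnapshot("b", "CH", "2020-03", "Our new spring collection is available."),
    FirmSnapshot("c", "DE", "2020-03", "Lockdown notice: online orders only."),
]
index = affectedness_index(snapshots)
```

Re-running the aggregation on each new crawl of the websites is what would give the indicator its real-time character.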